(54) Calculation of DFE coefficients from CIR

(57) A method for equalizing a receive signal in a digital receiver with the aid of a DFE (decision feedback
equalizer) (2) including one or several FF (feed-forward) filters (4) and a FB (feed-back) filter (6) comprises a
process based on displacement structure theory for solving a set of equations which define optimum filter
coefficients for the DFE (2) in terms of the CIR (channel impulse response). First, the FF filter coefficients (14) are
computed, and then the FB filter coefficients (16) are computed by convolving the FF filter coefficients (14)
with the CIR. The circuit means for performing the method comprise a linear chain of CORDIC type processing
elements (21, 22, 23, 31, 32, 33, 34, 41, 42, 43, 50, 61, 62, 63, 64, 65, 70, 81, 82, 91).
Description
Technical Field 

[0001] The invention relates to a method and a circuit arrangement for equalizing a receive signal in a digital receiver
with the aid of a DFE (decision feedback equalizer). 

Prior Art 

[0002] In communications systems which use transmission media such as radio or electric power-lines, the trans- 
mission medium exhibits time-dispersive propagation. This effect is typically caused by multipath propagation. Where 
time dispersion is present and the signaling rate of the system is high, the receivers in the system need to compensate 
for it. This can be done by using a DFE. 

[0003] A DFE comprises FF (feed-forward) and FB (feed-back) filters and a decision device. The coefficients of the 
FF and FB filters in the DFE must be correctly chosen. Their optimum values depend on the exact nature of the time 
dispersion in the medium, which will in general change over time. The time dispersion at a given time may be
characterized by the CIR (channel impulse response).

[0004] The current invention concerns itself with systems where the CIR is known in the receiver. If the CIR is known, 
the optimum coefficients for the DFE can be calculated from it. [WO 00/27083] describes a set of equations for defining
the optimum DFE coefficients in terms of a known CIR, where optimization is performed with respect to Minimum Mean 
Square Error (MMSE). These equations define the optimum for a certain class of modulation schemes. Equations 
defining the optimum coefficients for other modulation schemes and other assumptions (e.g. about noise power spectral 
density or number of receiver antennas) may be obtained from standard texts, e.g. [J. G. Proakis: Digital
Communications].

[0005] Before the DFE can be used, the appropriate one of these sets of equations has to be solved for the DFE
coefficients. All such sets of equations have a very similar structure and can be similarly solved.
[0006] Known solutions (i.e. methods and/or circuit arrangements for solving such sets of equations) are rather
complex and slow. Such a solution is described e.g. in [Kailath, T. and A. H. Sayed, 1995: Displacement Structure: Theory
and Applications; SIAM Review 37(3), 297-386].

Summary of the Invention 

[0007] It is an object of the present invention to provide a method and a circuit arrangement for equalizing a receive
signal in a digital receiver with the aid of a DFE structure, said method and circuit arrangement making it possible to
improve performance and/or reduce computational complexity in the receiver.

[0008] This object can be achieved with a method and/or a circuit arrangement defined by the independent claims. 
[0009] According to the invention, a method for equalizing a receive signal in a digital receiver with the aid of a DFE 
includes a process based on displacement structure theory for solving a set of equations defining optimum filter
coefficients for the DFE in terms of a CIR of the receiver. During execution of said process, first the FF filter coefficients are
computed and then the FB filter coefficients are computed by convolving the FF filter coefficients with the CIR. 
[0010] Fig. 1 depicts a schematic diagram of a known DFE. The diagram of Fig. 1 is valid also for the DFE used for
the invention. The DFE consists of as many FF filters as there are channel outputs per transmitted symbol. The outputs
of the FF filters are summed and the ISI (intersymbol interference) from the past decided symbols is subtracted by
the FB filter. The resulting signal is fed to a decision device. Both filters are most often implemented as linear transversal
FIR (finite impulse response) filters.

[0011] Instead of directly computing the equalizer coefficients from the known training symbols, it is computationally
attractive to follow an indirect path by first computing the CIR estimate and then the optimal equalizer coefficients from
the CIR estimate. The channel is assumed to be time invariant over the course of a single packet. 
[0012] Often the channel coherence time can be assumed to be much larger than the packet transmission time, so 
the DFE coefficients could stay unchanged for multiple packets. But since in a carrier sense multiple access scenario 
it is not clear a priori which transmitter originated a packet, the receiver must be able to compute new DFE coefficients 
for every packet. 

[0013] Several methods are known to derive optimum filter coefficients by first computing FB filter coefficients and 
then computing FF filter coefficients from the FB filter coefficients by using a procedure known as back substitution. 
In [Al-Dhahir and Cioffi, 1995, MMSE Decision-Feedback Equalizers: Finite-Length Results. IEEE Transactions on 
Information Theory 41(4), 961-975] equations for the filter coefficients are derived by writing the equalization problem
of a whole block in matrix form, using a finite CIR put into a fully windowed Toeplitz matrix and performing a QR 
factorization procedure for computing the optimum FB coefficients. The FF filter coefficients are then computed from 
the FB filter coefficients by back substitution. In [Kailath, T. and A. H. Sayed, 1995: Displacement Structure: Theory
and Applications; SIAM Review 37(3), 297-386] a process is proposed based on displacement structure theory for 
solving a set of equations defining optimum filter coefficients for the DFE in terms of a CIR of the receiver. Again the 
FF filter coefficients are computed from the FB filter coefficients by back substitution. 

[0014] In contrast to these prior art methods, however, according to the method of the invention the FF filter coefficients
are calculated first, and then the FB filter coefficients are computed by convolving the FF filter coefficients with the
CIR. Compared to back substitution, the method according to the invention is much better suited to parallel VLSI
implementation. In fact, it can be performed using the FF filter hardware of the DFE.

[0015] According to a first aspect of the invention a circuit arrangement for equalizing a receive signal in a digital 
receiver comprises a DFE and circuit means for performing a process based on displacement structure theory for 
solving a set of equations defining optimum filter coefficients for the DFE in terms of a CIR of the receiver. The circuit 
means are configured and arranged such that they first compute the FF filter coefficients and then compute the FB 
filter coefficients by convolving the FF filter coefficients with the CIR. It goes without saying that the circuit arrangement
according to this aspect of the invention is suited for performing the aforementioned method of the invention. 
[0016] According to another aspect of the invention a circuit arrangement for equalizing a receive signal in a digital 
receiver comprises a DFE and circuit means for performing a process for solving a set of equations defining optimum 
filter coefficients for the DFE in terms of a CIR of the receiver. The circuit means comprise a linear chain of CORDIC 
(Coordinate Rotation Digital Computer) type processing elements. The circuit arrangement according to this aspect of 
the invention is well adapted for performing the aforementioned method of the invention. However it is also suited for 
performing other methods for equalizing a receive signal in a digital receiver with the aid of a DFE, which include a 
process based on displacement structure theory for solving a set of equations defining optimum filter coefficients for 
the DFE in terms of a CIR of the receiver. 

[0017] Computing the DFE FF filter coefficients from the CIR requires the solution of a structured system of linear 
equations. It is known to use a linear chain of processing elements of the multiply-accumulate type to solve such block 
near-to-Toeplitz systems. In contrast to this known circuit arrangement the circuit arrangement according to one aspect 
of the present invention comprises a linear chain of processing elements using CORDIC blocks (also designated as 
CORDIC type processing elements). CORDIC type processing elements are advantageous for VLSI implementation
over multiply-accumulate type processing elements. With N_f designating the number of FF filter coefficients, the VLSI
architecture based on CORDIC type processing elements according to the invention computes the solution in O(N_f)
time and requires only O(N_f) processing elements.

[0018] Preferably the circuit means comprise a master CORDIC block and a slave CORDIC block, said master
CORDIC block receiving at a control input a first status signal and being switched between a vectoring mode of operation
and a rotation mode of operation depending on the first status signal. When performing the above mentioned method
according to the invention, the first status signal may indicate the state of processing the first row of the matrix G or B.
The master CORDIC block may produce a second status signal and transmit it to a control input of the slave CORDIC
block, which is switched between a rotation mode of operation with a first direction of rotation and a rotation
mode of operation with a second direction of rotation depending on the second status signal.

[0019] The circuit means may further comprise a FIFO memory means arranged in a feedback loop with respect to 
the linear chain of CORDIC type processing elements. This allows for reducing the number of CORDIC type processing 
elements used for the circuit arrangement according to the invention. 

[0020] From the following detailed description and from all the claims as a whole it will be clear to a person skilled 
in the art, that there exist more advantageous embodiments and combinations of features of the invention. 

Brief description of the drawings 

[0021] The drawings used for illustration of the examples show: 
Fig. 1 a schematic diagram of a DFE;

Fig. 2 a schematic diagram of a channel model of a communication system, for which the method and/or the circuit 
arrangement according to the invention are to be used; 

Fig. 3 a schematic diagram of the general architecture of a circuit arrangement for computing the DFE FF filter 
coefficients according to a first embodiment of the invention; 

Fig. 4 a schematic diagram of the general architecture of a circuit arrangement for computing the DFE FF filter 
coefficients according to a second embodiment of the invention; 
Fig. 5 a schematic diagram of the general architecture of a circuit arrangement for computing the DFE FF filter
coefficients according to a third embodiment of the invention; 

Fig. 6 a schematic diagram of two CORDIC blocks of a PE; 

Fig. 7 a schematic diagram of a CORDIC microrotation slice of the CORDIC blocks depicted in Fig. 6; 

Fig. 8 a schematic diagram of a PE for N_D = 1, N_O = 2, real numbers, white noise, of a circuit arrangement for
computing DFE filter coefficients according to the invention;

Fig. 9 a schematic diagram of a PE for N_D = 1, N_O = 2, real numbers, colored noise, of a circuit arrangement for
computing DFE filter coefficients according to the invention;

Fig. 10 a schematic diagram of a PE for N_D = 1, N_O = 1, complex numbers, white noise, of a circuit arrangement
for computing DFE filter coefficients according to the invention;

Fig. 11 a schematic diagram of a PE for N_D = 1, N_O = 1, complex numbers, colored noise, of a circuit arrangement
for computing DFE filter coefficients according to the invention;

Fig. 12 a detailed diagram of a PE for N_D = 1, N_O = 2, complex numbers, white noise, of a circuit arrangement for
computing DFE filter coefficients according to the invention;

Fig. 13 a schematic diagram of a simplified PE for N_D = 1, N_O = 2, real numbers, white noise, two passes per iteration
of a circuit arrangement for computing DFE filter coefficients according to the invention;

Fig. 14 a graphical representation of the results of computer simulation of residual ISI vs. number of microrotations; 

Fig. 15 a graphical representation of the results of computer simulation of residual ISI vs. wordlength; 

Fig. 16 a schematic diagram of a circuit arrangement with minimum latency architecture for DFE filter coefficient 
computation according to the invention; 

Fig. 17 a schematic diagram of a circuit arrangement with single PE architecture for DFE filter coefficient computation 
according to the invention. 

[0022] In principle in the drawings the same objects are given the same reference signs. 

Ways of carrying out the invention 

System Model 

[0023] Fig. 2 depicts the channel model of a communication system whose transmission medium exhibits
time-dispersive propagation, which is compensated for by the method and the circuit arrangement according to the invention.
For each symbol d_i generated by the transmitter, the channel outputs N_O values that are linear combinations of
transmitted symbols corrupted by stationary Gaussian noise n_i. i denotes the discrete time index. (·)^* denotes elementwise
complex conjugation, (·)^T transposition, and (·)^H hermitian transposition. E[·] denotes the expectation operator.
c_i = (c_i^(0) ... c_i^(N_O-1))^T denotes the channel impulse response coefficients, r_i = (r_i^(0) ... r_i^(N_O-1))^T the
received signal vector at time index i, and n_i = (n_i^(0) ... n_i^(N_O-1))^T the added noise vector.



[0024] The channel output is given by (I). The reason for treating the more complex single input multiple output
(SIMO) channel is the possibility to cast several practically interesting configurations into this framework. The T/2 spaced
DFE may be treated within this framework (N_O = 2).




\[ r_i = \sum_j c_j \, d_{i-j} + n_i \qquad \text{(I)} \]
[0025] Normally, all involved signals are complex. Some important signal constellations, such as binary phase shift
keying (BPSK) and minimum shift keying (MSK), employ a real signal constellation (d_i ∈ ℝ). In this case, one might
want to minimize only the real part of the error energy at the decision point, because the decision device is going to
throw away the imaginary part anyway [WO 00/27083]. This configuration may be treated as an all real system with
two channel outputs (N_O = 2).
[0026] Figure 1 depicts the Decision Feedback Equalizer 2, including the decision point signal and the recovered
estimate of the transmitted signal. N_f denotes the number of FF filter coefficients 14, and N_b the number of FB filter
coefficients 16.
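[0027] It is well known that the optimal FF filter coefficients 14 according to the MMSE criterion can be computed by
solving the linear system of equations Ax = y (II.1), where

\[ A = \begin{pmatrix}
c_0^* c_0^T + N_{(0-0)} & c_0^* c_1^T + N_{(1-0)} & c_0^* c_2^T + N_{(2-0)} & \cdots \\
c_1^* c_0^T + N_{(0-1)} & c_1^* c_1^T + c_0^* c_0^T + N_{(1-1)} & c_1^* c_2^T + c_0^* c_1^T + N_{(2-1)} & \cdots \\
c_2^* c_0^T + N_{(0-2)} & c_2^* c_1^T + c_1^* c_0^T + N_{(1-2)} & c_2^* c_2^T + c_1^* c_1^T + c_0^* c_0^T + N_{(2-2)} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix} \qquad \text{(II.2)} \]

the noise N_{i-j} = E[n_j^* n_i^T] is assumed to be stationary, x (III) is the vector of the FF filter coefficients 14,
and y (IV) is the right hand side vector.

A is a block matrix of size N_f x N_f where the blocks are of size N_O x N_O. A possesses hermitian symmetry. The FB
coefficients 16 for 1 ≤ k ≤ N_b (V) can be computed by convolving the CIR with the FF coefficients 14.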
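The convolution step can be sketched as follows for a real-valued channel with a single output (N_O = 1). This is only an illustration under an assumed indexing convention, namely that the FF filter acts on the current and the following N_f - 1 received samples; the names `fb_from_ff`, `c`, `f` and `n_b` are hypothetical and do not come from the original text.

```python
import numpy as np

def fb_from_ff(c, f, n_b):
    """Return FB taps b[1..n_b] from the CIR c and the FF taps f.

    Assumed convention (not taken from the source): the FF output is
    z[i] = sum_j f[j] * r[i + j], so the past symbol d[i - k] reaches the
    decision point with weight b[k] = sum_j c[k + j] * f[j].
    """
    c = np.asarray(c, dtype=float)
    f = np.asarray(f, dtype=float)
    # Convolving the CIR with the time-reversed FF taps yields all weights;
    # entry len(f) - 1 + k is the weight of the past symbol d[i - k].
    combined = np.convolve(c, f[::-1])
    # Zero-pad so that taps beyond the end of the response read as zero.
    combined = np.concatenate([combined, np.zeros(n_b)])
    return combined[len(f): len(f) + n_b]
```

Subtracting these weights in the FB filter removes the contribution of already decided symbols from the decision point signal; only one convolution is needed, with no back substitution.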
Method according to conventional displacement structure theory 

[0028] For the purpose of a better understanding of the method based on a modified displacement structure theory 
according to the invention, which is described later, first a known method for solving the set of equations as defined 
by equations II - IV is described, which known method is based on conventional displacement structure theory. 
[0029] The matrix A is highly structured. Each element is the sum of its north west neighboring element plus an
additional term. This structure can be exploited to simplify the inversion by using the Displacement Structure Theory
[Kailath, T. and A. H. Sayed, 1995: Displacement Structure: Theory and Applications; SIAM Review 37(3), 297-386].
In this subsection the basic displacement structure algorithm is reviewed. It factors the matrix A into its lower triangular
Cholesky factor L, i.e. A = LL^H, in an order recursive fashion. The idea is to use a "compressed" representation of A
and to perform the Cholesky recursion on that representation.

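[0030] Using the block lower shift matrix

\[ Z = \begin{pmatrix}
0 & 0 & 0 & 0 & \cdots \\
I & 0 & 0 & 0 & \cdots \\
0 & I & 0 & 0 & \cdots \\
0 & 0 & I & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} \]

where I denotes the N_O x N_O identity matrix and 0 the N_O x N_O all zero matrix, the displacement representation of A
with respect to the displacement operators Z

\[ \nabla_{\{Z,Z\}} A = A - Z A Z^H = \begin{pmatrix}
c_0^* c_0^T + N_0 & c_0^* c_1^T + N_1 & c_0^* c_2^T + N_2 & \cdots \\
c_1^* c_0^T + N_{-1} & c_1^* c_1^T & c_1^* c_2^T & \cdots \\
c_2^* c_0^T + N_{-2} & c_2^* c_1^T & c_2^* c_2^T & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix} = G J G^H \qquad \text{(VII)} \]

can be defined.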

[0031] It can be seen that ∇_{Z,Z} A can be factored into a product of the form G J G^H, where J = diag{±1, ±1, ±1}
is called the signature matrix and G is called the generator of ∇_{Z,Z} A.

\[ G = \begin{pmatrix}
c_0^* & N_0 N_0^{-H/2} & 0 \\
c_1^* & N_{-1} N_0^{-H/2} & N_{-1} N_0^{-H/2} \\
c_2^* & N_{-2} N_0^{-H/2} & N_{-2} N_0^{-H/2} \\
\vdots & \vdots & \vdots
\end{pmatrix} \qquad \text{(VIII)} \]

and

\[ J = \begin{pmatrix} 1 & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & -I \end{pmatrix} \qquad \text{(IX)} \]

show one factorization of ∇_{Z,Z} A. N_0^{-H/2} denotes the Cholesky factor of the inverse of N_0. In the white noise case, N_j is
identically zero for j ≠ 0, and the matrix N_0 = N_0 I is a scaled identity matrix. In this case the factorization simplifies to (X) and J
becomes an identity matrix. We will assume white noise from now on for simplicity.

\[ G = \begin{pmatrix}
c_0^* & \sqrt{N_0}\, I \\
c_1^* & 0 \\
c_2^* & 0 \\
\vdots & \vdots
\end{pmatrix} \qquad \text{(X)} \]

Generators are not unique. In fact, if G is a generator then GΘ is also a generator, provided that Θ is J-unitary, i.e.
ΘJΘ^H = J, since GΘJΘ^H G^H = GJG^H.
[0032] From now on, matrices and vectors will be treated as matrices and vectors of scalars. First, Θ shall be chosen
such that Ḡ = GΘ has only one nonzero element in the first row. Any popular zeroing tool such as Givens rotations,
fast Givens or Householder reflections may be used to find Θ. The column with the nonzero element in its first row is
called the pivoting column g_pvt.

[0033] Since only g_pvt has a nonzero first element, g_pvt determines the first row and the first column of ∇_{Z,Z} A and
thus also of A. g_pvt is therefore the first column of the Cholesky factor of A. Subtracting g_pvt g_pvt^H from A zeroes the
first column and the first row, leaving a problem whose order is reduced by 1. The generators G_1 of this order reduced
problem can be computed directly from the generators of the original problem:

\[ \left(A - g_{pvt}\, g_{pvt}^H\right) - Z \left(A - g_{pvt}\, g_{pvt}^H\right) Z^H
 = \begin{pmatrix} g_0 & \cdots & Z g_{pvt} & \cdots & g_{r-1} \end{pmatrix} J
   \begin{pmatrix} g_0 & \cdots & Z g_{pvt} & \cdots & g_{r-1} \end{pmatrix}^H
 = G_1 J G_1^H \qquad \text{(XI)} \]

Equation (XI) forms the basis of the displacement structure recursion step.
[0034] Algorithm I is summarized below:

1. Find Θ such that the first row of G multiplied by Θ results in a vector with only one nonzero element.

2. Postmultiply G by Θ, i.e. Ḡ = GΘ. The column with the nonzero first element is called g_pvt.

3. Store g_pvt into the appropriate column of the Cholesky factor.

4. Premultiply the pivoting column with Z.

5. Delete the first row of the generator.

[0035] Since the number of columns of G and thus the size of Θ is independent of N_f, the algorithm requires O(N_f^2)
operations.
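The five steps of Algorithm I can be sketched as follows for the simplest setting: real numbers, N_O = 1 and white noise, so that J is the identity and ordinary Givens rotations suffice. The function names `schur_cholesky` and `white_noise_system` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def schur_cholesky(G):
    """Cholesky factor L of A, given a generator G with A - Z A Z^T = G G^T."""
    G = np.array(G, dtype=float)
    n, r = G.shape
    L = np.zeros((n, n))
    for i in range(n):
        # Steps 1-2: Givens rotations zero the first row except column 0.
        for j in range(1, r):
            a, b = G[0, 0], G[0, j]
            h = np.hypot(a, b)
            if h == 0.0:
                continue
            cs, sn = a / h, b / h
            col0 = cs * G[:, 0] + sn * G[:, j]
            colj = -sn * G[:, 0] + cs * G[:, j]
            G[:, 0], G[:, j] = col0, colj
        # Step 3: the pivoting column is column i of the Cholesky factor.
        L[i:, i] = G[:, 0]
        # Step 4: premultiply the pivoting column with Z (shift down by one).
        G[:, 0] = np.concatenate(([0.0], G[:-1, 0]))
        # Step 5: delete the first generator row.
        G = G[1:, :]
    return L

def white_noise_system(c, n0):
    """Build A[k, l] = sum_m c[k-m] c[l-m] + n0 * (k == l) and its generator."""
    c = np.asarray(c, dtype=float)
    n = len(c)
    A = n0 * np.eye(n)
    for k in range(n):
        for l in range(n):
            m = np.arange(min(k, l) + 1)
            A[k, l] += np.sum(c[k - m] * c[l - m])
    e0 = np.zeros(n)
    e0[0] = 1.0
    return A, np.column_stack([c, np.sqrt(n0) * e0])
```

Each pass of the outer loop touches only the r generator columns, which is where the O(N_f^2) operation count comes from; the dense matrix A is built here solely to check the result.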

Method according to modified displacement structure theory according to the invention 

[0036] The algorithm of the conventional displacement structure theory computes the Cholesky factor. The desired
FF filter coefficients need to be obtained by back substitution. In the following section a novel algorithm is developed
that directly outputs the desired FF filter coefficients 14.

[0037] Displacement Structure Theory can be generalized to non hermitian symmetric matrices R,

\[ \nabla_{\{F_1,F_2\}} R = R - F_1 R F_2^H = G J B^H \qquad \text{(XII)} \]
[0038] There are now two displacement operators (F_1 and F_2). Also, G ≠ B in general. While G and B must have the
same number of columns, their numbers of rows differ for nonsquare matrices.

[0039] For hermitian symmetric matrices, the recursion step of the conventional displacement structure algorithm
(XI) required the computation of a matrix Θ that cleared all but one entry of the first row of GΘ. For the general case,
two matrices Θ and Γ have to be found that clear all but one entry of the first row of both GΘ and BΓ, and for which
ΘJΓ^H = J. This problem is much more involved than the corresponding problem for the hermitian symmetric case.
[0040] In the single output case (i.e., N_D = 1, where N_D denotes the number of symbols generated by the transmitter
per time step, also designated as "Decision Feedback Detector outputs") the block matrix

\[ R = \begin{pmatrix} A & y \\ -I & 0 \end{pmatrix} \qquad \text{(XIII)} \]

of size 2N_f N_O x (N_f N_O + 1) consists of the N_f N_O x N_f N_O matrix A used in the previous section, the N_f N_O x 1 right hand
side y of that section, an N_f N_O x N_f N_O negative identity matrix -I, and an N_f N_O x 1 zero vector. Now the 1,1-Schur
complement

\[ S = 0 + I A^{-1} y \qquad \text{(XIV)} \]

is exactly the desired solution [Schur, I.: 1917, "Über Potenzreihen, die im Innern des Einheitskreises beschränkt
sind". Journal für die reine und angewandte Mathematik 147, 205-232]. The generators for the displacement structure
representation of the 1,1-Schur complement ∇_{F_1,F_2} S can be found by running the recursion N_f N_O times. But since
S is an N_f N_O x 1 vector, F_1 S F_2^H = 0, and therefore the generators directly represent the desired solution S.
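To see why the Schur complement of the block matrix is the solution of Ax = y, the elimination can be carried out explicitly on a small dense example. The sketch below ignores the displacement structure entirely and uses plain elimination; the function name `schur_complement_solve` is an illustrative assumption.

```python
import numpy as np

def schur_complement_solve(A, y):
    """Eliminate the A block of [[A, y], [-I, 0]]; what remains is A^{-1} y."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    R = np.block([[A, y.reshape(n, 1)],
                  [-np.eye(n), np.zeros((n, 1))]])
    for _ in range(n):
        # One order-reduction step: remove the first row and column and
        # subtract their outer product, scaled by the pivot element.
        R = R[1:, 1:] - np.outer(R[1:, 0], R[0, 1:]) / R[0, 0]
    return R[:, 0]
```

After N_f N_O such steps only the bottom right block is left, and it equals 0 + I A^{-1} y. The displacement structure recursion reaches the same result while operating on the generators only.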
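[0041] Now F_1 and F_2 need to be found that lead to low rank generators. Suitable displacement operators

\[ F_1 = \begin{pmatrix} Z & 0 \\ 0 & Z \end{pmatrix} \qquad \text{(XV.1)} \]

and

\[ F_2 = \begin{pmatrix} Z & 0 \\ 0 & 0 \end{pmatrix} \qquad \text{(XV.2)} \]

lead to low rank generators

\[ \nabla_{\{F_1,F_2\}} R =
\begin{pmatrix}
c_0^* & \sqrt{N_0}\, I \\
c_1^* & 0 \\
c_2^* & 0 \\
\vdots & \vdots \\
0 & -\frac{1}{\sqrt{N_0}}\, I \\
0 & 0 \\
\vdots & \vdots
\end{pmatrix}
\begin{pmatrix}
c_0^* & \sqrt{N_0}\, I \\
c_1^* & 0 \\
c_2^* & 0 \\
\vdots & \vdots \\
1 & 0
\end{pmatrix}^H
= G B^H \qquad \text{(XVI)} \]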



[0042] Again there is some flexibility in choosing the generators. A remarkable fact about the generators in (XVI) is
however that the first N_f N_O rows of G and B are equal. There are two implications of this:

- The general problem of finding Θ and Γ that zero all but one element in the first row of G and B and that satisfy
ΘJΓ^H = J reduces to the much simpler problem of finding a J unitary matrix that zeros all but one element in the
first row of G

- The number of rows that need to be stored and multiplied by Θ is reduced
[0043] Algorithm II is summarized below:

1. Find Θ such that the first row of G (which equals the first row of B) multiplied by Θ results in a vector with only
one nonzero element.

2. Postmultiply G and B by Θ, i.e. Ḡ = GΘ and B̄ = BΘ. The column with the nonzero first element of Ḡ is called g_pvt, the column
with the nonzero first element of B̄ is called b_pvt.

3. Store g_pvt into the appropriate column of the Cholesky factor (if the Cholesky factor is explicitly needed).

4. Premultiply g_pvt with F_1 as defined by equation (XV.1) and premultiply b_pvt with F_2 as defined by equation (XV.2).

5. Delete the first row of the generator matrices Ḡ and B̄. Ḡ and B̄ become G and B of the next iteration.

6. Repeat steps 1 - 5 N_f N_O times.

7. Multiply the remaining N_D rows of B with the remainder of G to obtain the displacement representation of the
FF filter coefficients 14, or, in the single output (N_D = 1) case, the FF filter coefficients 14 themselves.

[0044] It is to be noted that in step 2 the first N_f N_O - i rows of G and B are equal, where i is the iteration number,
0 ≤ i ≤ N_f N_O - 1. Therefore, a significant number of computations can be saved by performing the multiplications of these rows
by Θ only once.
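A compact software sketch of Algorithm II for real numbers, N_O = N_D = 1 and white noise is given below. It rests on assumptions that are not spelled out in the text: the right hand side y is taken to be the first N_f taps of the CIR, the generators are built so that the top N_f rows of G and B coincide, and F_1 and F_2 act as downward shifts within each block. The function name `ff_coefficients` is hypothetical.

```python
import numpy as np

def ff_coefficients(c, n0):
    """Solve A x = y by the generator recursion (real, N_O = N_D = 1, white noise).

    Assumes A[k, l] = sum_m c[k-m] c[l-m] + n0 * (k == l) and y = c.
    """
    c = np.asarray(c, dtype=float)
    n = len(c)                                   # N_f
    e0 = np.zeros(n)
    e0[0] = 1.0
    top = np.column_stack([c, np.sqrt(n0) * e0])  # rows shared by G and B
    G = np.vstack([top, np.column_stack([np.zeros(n), -e0 / np.sqrt(n0)])])
    B = np.vstack([top, [[1.0, 0.0]]])
    for i in range(n):
        t = n - i                                # rows left in the shared block
        # Steps 1-2: one Givens rotation zeroes the second entry of row 0.
        h = np.hypot(G[0, 0], G[0, 1])
        cs, sn = G[0, 0] / h, G[0, 1] / h
        rot = np.array([[cs, -sn], [sn, cs]])
        G, B = G @ rot, B @ rot
        # Step 4: shift the pivoting columns within each block (F_1 and F_2).
        g, b = G[:, 0].copy(), B[:, 0].copy()
        G[:, 0] = np.concatenate(([0.0], g[:t - 1], [0.0], g[t:-1]))
        B[:, 0] = np.concatenate(([0.0], b[:t - 1], [0.0]))
        # Step 5: delete the first row of both generators.
        G, B = G[1:], B[1:]
    # Step 7: the single remaining row of B times the remaining rows of G.
    return G @ B[0]
```

No triangular system is solved at any point: after N_f order-reduction steps the product of the remaining generator rows is the coefficient vector itself, which is the property that makes a linear chain of identical processing elements sufficient.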

VLSI architecture of circuit arrangement



General architecture 



[0045] The algorithm II based on a modified displacement structure theory as described above naturally maps onto
a linear chain of N_f N_O processing elements 21, 22, 23 (Figure 3). The 2N_f N_O + 1 rows of G and B enter the first processing
element 21 row wise. At the output of the last processing element 23, the last remaining row of B is latched and
multiplied by the remaining rows of G.

[0046] Each PE (processing element) 21, 22, 23 performs one order-reduction step, namely the determination of Θ



EP 1 248 425 A1 



on entering of the first row, postmultiplication of all entering rows with Θ, and premultiplication of the pivoting columns
with F_1 and F_2, which amounts to delaying the pivoting column by N_O rows and zeroing certain elements of the pivoting
column.



\[ \begin{pmatrix} G_1 = B_1 \\ B_2 \\ G_2 \end{pmatrix} \qquad \text{(XVII)} \]



[0047] It is beneficial to feed the rows of both generator matrices G and B in the order given by (XVII) from top to
bottom. G_1 and B_1 are the rows of G and B that are equal, B_2 is the remaining row of B and G_2 consists of the remaining
N_f N_O rows of G. That way, the single remaining row of B, which must be multiplied/accumulated with all remaining
rows of G, leaves the last PE 23 first.

[0048] A slight modification of the architecture in Figure 3 is depicted in Figure 4. Here, another slightly modified
processing element 34 (PE#N_f N_D) is appended at the end of the chain of PE's 31, 32, 33, 34. This last PE 34 only
performs the Θ related operations. It does not apply the displacement operator F, nor does it delete the first row.
[0049] This last PE 34 zeros all but one element of the remaining row of B, leaving only one nonzero element whose
imaginary part is also zeroed. Thus, the r complex multiplications required in Figure 3, where r denotes the number of
columns of G and B, are now reduced to a single multiplication by a real value per FF filter coefficient 14. This
multiplication does not even need to be performed; it only scales the FF filter coefficients 14 and thus also the FB filter
coefficients 16 and the bias a by a constant. For the binary antipodal constellation, whose decision device is scaling
invariant, only the sign of a has to be retained and combined with the output of the decision device by an exclusive
OR gate. In order to avoid increasing the dynamic range of the DFE filter sections, it is however desirable to normalize
the FF filter coefficients 14 to fall within a predefined range, e.g. by using a barrel shifter.

[0050] As mentioned before, a new computation can be started while the current one is still in flight. This capability
is often not necessary. But since all PE's 41, 42, 43 perform the same computation, the number of PE's 41, 42, 43 may
be reduced and the data may be cycled multiple times through a (shorter) chain of PE's 41, 42, 43. Figure 5 illustrates
this idea.

[0051] The FIFO 45 is only necessary if N_PE L_PE < 2N_f N_O + 1, where N_PE denotes the number of PE's 41, 42, 43
and L_PE the latency of each PE 41, 42, 43. On the other hand, if N_PE L_PE > 2N_f N_O + 1, then the utilization of the PE's
may be increased at the expense of an increased latency.



The PE's 



[0052] The main purpose of each PE is to find a suitable J unitary matrix Θ that zeroes all but one element in the
first generator row and then multiply the whole generator with Θ.

[0053] The case where G is real-valued will be discussed first, with the necessary extensions for complex-valued G
presented afterwards.

[0054] Instead of solving the whole problem at the same time, a divide and conquer approach shall be chosen. The
matrix Θ is split into a number of simpler matrices, i.e. Θ = Θ_1 Θ_2 ⋯. Each of these simpler submatrices only deals
with two columns of the generator, while leaving all other columns unaffected.

[0055] The purpose of each of these smaller matrices Θ_i is to zero the first element of one of the two columns it
processes. The condition is that Θ_i must be J_i unitary, i.e. Θ_i J_i Θ_i^H = J_i, where J_i is the diagonal signature matrix consisting
of the two entries from the signature matrix J corresponding to the two columns selected.

[0056] Two cases need to be distinguished, namely the case where the diagonal entries of J_i have the same sign and
the case where these entries have different signs. In the former case, Θ_i becomes an angular rotation matrix, and
in the latter case, Θ_i is a hyperbolic rotation matrix.
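The two cases can be illustrated with real 2x2 matrices acting on a generator row from the right. The helper name `elementary_rotation` and its `same_sign` flag are illustrative assumptions; the hyperbolic branch assumes |a| > |b|.

```python
import numpy as np

def elementary_rotation(a, b, same_sign):
    """Return a 2x2 matrix T with (a, b) @ T = (r, 0) and T J T^T = J.

    same_sign=True  -> J = diag(1, 1),  T is an angular rotation.
    same_sign=False -> J = diag(1, -1), T is a hyperbolic rotation
                       (requires |a| > |b|).
    """
    if same_sign:
        h = np.hypot(a, b)
        cs, sn = a / h, b / h
        return np.array([[cs, -sn], [sn, cs]])
    h = np.sqrt(a * a - b * b)
    ch, sh = a / h, b / h
    return np.array([[ch, -sh], [-sh, ch]])
```

In both cases the second entry of the row is cleared, and the quantity a^2 + b^2 (angular) or a^2 - b^2 (hyperbolic) is preserved, which is exactly the J unitarity condition for two columns.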

[0057] The standard tool for computing angular and hyperbolic rotations efficiently in VLSI is the CORDIC technique
[Volder, J. E., 1959: The CORDIC Trigonometric Computing Technique; IRE Transactions on Electronic Computers
EC-8, 330-334. Walther, J. S., 1971: A Unified Algorithm for Elementary Functions; Proceedings Spring Joint Computer
Conference, Vol. 38, p. 397]. Upon entering of the first row, a rotation angle has to be determined that zeros one of
the two elements. This can be achieved with a CORDIC circuit (also designated as CORDIC cell, CORDIC block or
CORDIC processor) in vectoring mode. Subsequent rows have to be rotated by the determined angle. This can be
achieved with a CORDIC circuit in rotation mode. This is an elegant result because the same hardware can be used
for both tasks, namely finding Θ and multiplying the generator rows with Θ. The CORDIC cell just needs to be switched
into vectoring mode for the first generator row and then switched back into rotation mode. Note that the angle does
not need to be computed explicitly. It is sufficient to store the direction of each CORDIC microrotation.
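The interplay of the two modes can be sketched in floating point for the angular case. The function names, the default of 24 microrotations and the rule for picking the direction from the signs of both inputs are assumptions made for this illustration; a hardware implementation would use fixed-point shifts and adds.

```python
def cordic_vectoring(x, y, n_iter=24):
    """Vectoring mode: drive y towards zero and record each direction."""
    dirs = []
    for k in range(n_iter):
        # Rotate towards the nearer half of the x axis; the direction
        # depends only on whether the two signs agree.
        d = -1 if (x >= 0) == (y >= 0) else 1
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
        dirs.append(d)
    return x, y, dirs

def cordic_rotation(x, y, dirs):
    """Rotation mode: replay the stored directions on another pair."""
    for k, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
    return x, y
```

The first generator row would be passed through `cordic_vectoring`, and every later row through `cordic_rotation` with the same `dirs` list. All rows are scaled by the same constant CORDIC gain (about 1.6468 for the angular case), and no angle is ever computed.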
[0058] PE's for complex numbers are similar to those for real numbers. There are two modifications:

1. r angular CORDIC processors are used to make all r elements of the first row real (i.e. make their imaginary
part vanish);

2. for each CORDIC processor for the Θ_i matrices, there is a second CORDIC processor working always in rotation
mode and processing the imaginary parts, slaved to the CORDIC processor for the real part.

[0059] Fig. 6 depicts a schematic diagram of two CORDIC blocks of a PE 50, i.e. a master CORDIC block 51 (also
designated as upper CORDIC block 51) and a slave CORDIC block 52 (also designated as lower CORDIC block 52).
Each CORDIC block consists of multiple microrotation slices 53, 54, 55 and 56, 57, 58 (also designated as
microrotations). The microrotations 53, 54, 55, 56, 57, 58 may be implemented in a pipelined way, as shown, or executed serially
on the same microrotation hardware. The signal "FIRST ROW" is active when the first generator row is fed into the
CORDIC block. It switches the upper or master CORDIC block 51 into vectoring mode. If inactive, the master CORDIC
block 51 operates in rotation mode. In vectoring mode, the master CORDIC block 51 zeros its lower output.

[0060] Hyperbolic CORDIC blocks need to swap their inputs if the magnitude of the upper input of the first row is
smaller than the magnitude of the lower input to ensure convergence.

[0061] The complex equalizers employ "stacked" CORDIC blocks, depicted with an arrow from the upper to the lower
one. The lower or slave CORDIC block 52 always operates in rotation mode with the rotation directions supplied by
the master CORDIC block 51.

[0062] Fig. 7 depicts a schematic diagram of the proposed microrotation 53 (also designated as CORDIC microrotation
circuit 53). The circuit 53 computes one microrotation per clock. The microrotations of a CORDIC block can
either be computed sequentially on a single microrotation circuit, or they can be computed using a chain of as many
microrotation circuits as there are microrotations, possibly with pipeline registers in between. In the former case, the
shifter can be realized with a barrel shifter and the microrotation direction storage with an array of latches or a small
RAM block. In the latter case, the shifters can be realized with wiring, because the shift count is constant. The following
truth table (table 1) indicates whether the Adder/Subtractors add or subtract the shifter output to or from the input,
depending on the rotation direction signal DIR:



Table 1: truth table for microrotation circuit of Fig. 7

                 angular CORDIC    hyperbolic CORDIC
DIR              0       1         0       1
Upper ADD/SUB    +                         +
Lower ADD/SUB                      +

[0063] The domain of convergence is limited, though. Angular CORDIC converges if the angle of the input vector in
the cartesian plane lies within approximately ±1.74 radians. There are, however, two possible choices of θ: one that
results in the first element of the pivoting column being positive and another one that results in a negative element.
Instead of using a prerotation stage which rotates the input into the domain of convergence, it is possible to choose
the θ that lies in the domain of convergence. That is the purpose of the XOR gate in Fig. 7.
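
The convergence limit and the choice between the two admissible rotations can be illustrated with a short sketch. It assumes the usual angular CORDIC shift sequence 0, 1, 2, ... and models the direction choice as an exclusive-or of the two sign bits; that is one plausible reading of the XOR gate, not a description of the actual circuit of Fig. 7, and all names are hypothetical.

```python
import math


def convergence_range(n):
    """Largest angle (radians) that n angular microrotations can remove."""
    return sum(math.atan(2.0 ** -shift) for shift in range(n))


def direction(x, y):
    """+1 if the signs of x and y differ, else -1 (an XOR of the sign bits)."""
    return 1 if (x < 0) != (y < 0) else -1


def vectoring(x, y, n=8):
    """Zero y using the rotation that stays inside the domain of convergence."""
    for shift in range(n):
        d = direction(x, y)
        x, y = x - d * y * 2.0 ** -shift, y + d * x * 2.0 ** -shift
    return x, y


print(round(convergence_range(8), 4))    # eight microrotations
print(round(convergence_range(60), 4))   # close to the limiting value
print(vectoring(3.0, 4.0))               # first element ends up positive
print(vectoring(-3.0, 4.0))              # first element ends up negative
```

With this rule an input in the left half plane is rotated onto the negative axis instead of being prerotated, so the required rotation angle never exceeds a quarter turn.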

[0064] For hyperbolic rotations, there are additional problems. If the magnitudes of both input operands are
approximately the same, |a| ≈ |b|, then θ → ±∞. The generator columns whose signature matrix entries have the same
sign can be grouped together. Angular CORDIC rotations may be used within both groups to zero all but one element of
the first row in each group. Only one hyperbolic CORDIC rotator is then required to zero the single nonzero element in
the first row of the column group having a -1 signature matrix entry. If both first row inputs to this single CORDIC
rotator have approximately the same magnitude, then the corresponding diagonal element of the Cholesky factor L is
close to zero, resulting in at least one very large FF filter coefficient. For well behaved problems, this does not happen.
[0065] Furthermore, in hyperbolic mode, the magnitudes of the input operands determine which one gets zeroed,
namely the operand with the smaller magnitude. This can be circumvented by a stage in front of the hyperbolic CORDIC
circuitry that swaps the columns if the magnitude of the first element of the column to be zeroed is bigger than the
magnitude of the first element of the pivoting column.
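
The swap stage can be sketched as follows. The fragment is an illustration under assumptions: it uses floating point, eight hyperbolic microrotations with the shift = 1 stage executed twice, and hypothetical function names. It shows only the first-row operation; in the arrangement described above the same swap would apply to the whole columns.

```python
# Assumed shift sequence: the first stage (shift = 1) is executed twice.
SHIFTS = (1, 1, 2, 3, 4, 5, 6, 7)


def hyperbolic_vectoring(x, y, shifts=SHIFTS):
    """Drive y towards zero with hyperbolic microrotations (needs |x| > |y|)."""
    for shift in shifts:
        d = 1 if (x < 0) != (y < 0) else -1
        x, y = x + d * y * 2.0 ** -shift, y + d * x * 2.0 ** -shift
    return x, y


def swap_then_vector(upper, lower):
    """Swap the first-row inputs if the upper magnitude is the smaller one."""
    swapped = abs(upper) < abs(lower)
    if swapped:
        upper, lower = lower, upper
    x, y = hyperbolic_vectoring(upper, lower)
    return x, y, swapped


# The pair (3, 5) is swapped first, so the element of smaller magnitude is zeroed.
x, y, swapped = swap_then_vector(3.0, 5.0)
```

Up to the CORDIC gain, the surviving element approaches the square root of 5² - 3², i.e. 4. Inputs of nearly equal magnitude lie outside the range covered by this shift sequence and do not converge.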

[0066] Figures 8 through 11 illustrate the PE structure for N_D = 1 equalizers for white and colored noise using real
and complex numbers. The constant multipliers cancel the gain introduced by the CORDIC blocks. The constants given
are approximate; they depend on the number of microrotations performed by the CORDIC blocks. For the hyperbolic
processors, the first stage (shift = 1) is assumed to be executed twice to improve precision when the magnitudes of both
input signals are approximately the same. The multiplexer(s) and register(s) implement the multiplication of the pivoting
column with the displacement operator. g_i denotes the i-th column of the generator, g_1 is the pivoting column, and
g_i^Re and g_i^Im denote the real and imaginary part of the i-th column, respectively.
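
How such gain constants depend on the number of microrotations can be sketched numerically. The shift sequences below (0 to 7 for the angular case; 1, 1, 2, ..., 7 for the hyperbolic case) are assumptions for illustration, and the resulting values are not claimed to be the constants shown in the figures.

```python
def angular_gain(n):
    """Gain of n angular microrotations with shifts 0 .. n-1."""
    gain = 1.0
    for shift in range(n):
        gain *= (1.0 + 4.0 ** -shift) ** 0.5
    return gain


def hyperbolic_gain(shifts):
    """Gain of hyperbolic microrotations for the given shift sequence."""
    gain = 1.0
    for shift in shifts:
        gain *= (1.0 - 4.0 ** -shift) ** 0.5
    return gain


k_angular = angular_gain(8)
k_hyperbolic = hyperbolic_gain((1, 1, 2, 3, 4, 5, 6, 7))  # shift = 1 twice

# A compensating constant multiplier would use the reciprocal of the gain.
print(round(1.0 / k_angular, 4), round(1.0 / k_hyperbolic, 4))
```

Each additional microrotation changes the gain by less than the one before it, which is why the constants settle quickly as the number of microrotations grows.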

[0067] Fig. 8 depicts a schematic diagram of a PE 61 for N_D = 1, N_Q = 2, real numbers and white noise. The constant
coefficient multipliers cancel the gain of the CORDIC blocks. The registers and the multiplexer perform the
premultiplication of the pivoting column with the displacement operators. g_1, g_2 and g_3 denote the generator columns.

[0068] Fig. 9 depicts a schematic diagram of a PE 62 for N_D = 1, N_Q = 2, real numbers and colored noise. The
constant coefficient multipliers cancel the gain of the CORDIC blocks. The registers and the multiplexer perform the
premultiplication of the pivoting column with the displacement operators. g_1 ... g_5 denote the generator columns.

[0069] Fig. 10 depicts a schematic diagram of a PE 63 for N_D = 1, N_Q = 1, complex numbers and white noise. The
constant coefficient multipliers cancel the gain of the CORDIC blocks. The registers and the multiplexer perform the
premultiplication of the pivoting column with the displacement operators. g_1^Re, g_1^Im, g_2^Re and g_2^Im denote the
real and imaginary parts of the generator columns.

[0070] Fig. 11 depicts a schematic diagram of a PE 64 for N_D = 1, N_Q = 1, complex numbers and colored noise. The
constant coefficient multipliers cancel the gain of the CORDIC blocks. The registers and the multiplexer perform the
premultiplication of the pivoting column with the displacement operators. g_1^Re, g_1^Im ... g_3^Re and g_3^Im denote
the real and imaginary parts of the generator columns.

[0071] Fig. 12 shows a detailed diagram of a PE 65 for the N_D = 1 equalizer for complex numbers and white noise
using pipelined CORDIC blocks.

[0072] The hardware complexity can be reduced further at the expense of the number of clock cycles required for 
the computation. Instead of using a PE that operates on all columns simultaneously, it is possible to use a PE 70 that 
operates only on a lower number of columns per pass, thus requiring multiple passes per iteration. Fig. 13 illustrates 
this idea. Multiplying the pivoting column with the displacement operator must be performed only during the last pass; 
the two registers may therefore be bypassed. 

Examples for HIPERLAN

[0073] Hereinafter two examples of circuit arrangements are presented, which are well suited for the HIPERLAN I
system [Radio Equipment and Systems (RES), 1995: High PERformance Radio Local Area Network (HIPERLAN),
Type 1 functional specification; Technical Report ETS 300 652, European Telecommunications Standards Institute
(ETSI)].

[0074] The HIPERLAN I specification requires a terminal to start transmitting the response to a packet received within
512 bit periods, i.e. 25.6 µs. The FF filter 4 can for example be implemented by four multipliers operating at 60 MHz,
three times the HIPERLAN I bit rate. The FB filter 6 consists of only N_b = N_f - 1 = 11 Adder/Subtractors, since
d_j ∈ {±1}. It is the computation of the DFE coefficients that is generally believed to be a problem.

[0075] A symbol spaced DFE 2 with twelve FF taps is assumed, i.e. N_D = 1 and N_f = 12. This equalizer 2 performs
well for typical indoor channels (50 ns delay spread), and experiences only minor degradation for bad channels (150 ns
delay spread). Furthermore, fully pipelined CORDIC elements are assumed. Each CORDIC block executes eight
microrotations, and a data path width of twelve bits is assumed.

[0076] The computer simulations of Fig. 14 show that no further reduction in residual ISI energy at the decision point
can be achieved by increasing the number of microrotations. Fig. 15 contains a computer simulation plot of the residual
ISI energy versus wordlength, assuming a perfect receiver automatic gain control (AGC). The infinite wordlength
performance is reached at a datapath width of 8 bits, and choosing 12 bits results in approximately 20 dB margin for
imperfect AGC.

[0077] Since the PE's 81, 82, 91 have four CORDIC blocks and the deletion of a row in series, the total PE latency
is 17 clocks. The generator rows pass through N_f N_Q + 1 = 13 PE's. Table 2 summarizes the PE's 81, 82, 91.



Table 2: Summary of processing elements

# of CORDIC blocks                 4
# of microrotations                8
Total PE latency                   17 clocks
# of adder/subtractors             64
# of wordlength sized registers    66






In determining the implementation complexity the overall control circuitry, the microrotation direction register and XOR 
gate, the zeroing gates, and the constant coefficient multipliers have been neglected, because their size is small com- 
pared to the CORDIC datapath. The constant coefficient multipliers from several or all PE's can be combined and can 
be implemented with few shift/adds. Shifts can be inserted to prevent excessive growth of the signal magnitude. 
[0078] Fig. 16 shows the minimum latency solution. It uses a chain of two PE's 81, 82 and 6½ passes through the
chain. No FIFO is necessary.
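
The latency of this arrangement follows from figures already given: thirteen PE traversals of 17 clocks each. The sketch below reproduces that arithmetic and relates it to the 25.6 µs turnaround time. The bit rate of 20 Mbit/s is inferred from the statement that 60 MHz is three times the bit rate, and the resulting clock frequency is only a lower bound, since it assumes that the entire turnaround time is available for the coefficient computation. Variable names are hypothetical.

```python
from fractions import Fraction

N_F = 12            # FF taps
N_Q = 1             # channel outputs (symbol spaced)
PE_LATENCY = 17     # clocks per PE
PES_IN_CHAIN = 2    # chain of two PE's

pe_traversals = N_F * N_Q + 1                     # PE's each generator row passes
passes = Fraction(pe_traversals, PES_IN_CHAIN)    # passes through the chain
latency_clocks = pe_traversals * PE_LATENCY

BIT_RATE_HZ = 20e6                                # inferred: 60 MHz / 3
deadline_s = 512 / BIT_RATE_HZ                    # 512 bit periods
min_clock_hz = latency_clocks / deadline_s        # lower bound only

print(pe_traversals, passes, latency_clocks)
print(deadline_s * 1e6, "us;", round(min_clock_hz / 1e6, 2), "MHz")
```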

[0079] The implementation complexity can be halved by using only one PE 91. This solution is depicted in Fig. 17.
Now a FIFO 95 is necessary. It is assumed that the FIFO 95 latency can be varied from 1 to 9.

[0080] The implementation numbers (area and power/energy consumption) include the CORDIC datapath only. The
controller, the FIFO, and the multipliers are neglected. The constant coefficient multipliers that cancel the gain of the
CORDIC blocks have been rearranged so that only one nontrivial constant coefficient multiplier remains in the output
circuitry. The two variable × variable multipliers in the output circuitry do not need to be exact with respect to the common
argument. It is sufficient to use an exponent detector and two shifters instead.

[0081] Table 3 lists area and power consumption data of some cells from the AMS 0.35 µm standard cell library [AMS
Austria Mikro Systeme International AG, 0.35 Micron Standard Cell 3.3V Databook - 0.35 µm CMOS Digital Core Cells
3.3V. April 2000. http://asic.amsint.com/databooks/csx33/core/]. 1x drive strength gates are used because the
datapath cells have a fan out of only one or two and the distances are small, leading to small capacitive loading. The power
consumption numbers include a cell load capacity of 30 fF. A ripple carry adder/subtractor may be built from a 2-input
XOR gate and a full adder cell per bit. The shifter uses an 8:1 multiplexer per bit.

Table 3: AMS 0.35 µm standard cell library data

Gate    Description                   Area (µm²)    Power (µW/MHz)
E01     2-input XOR (1x)              146           0.883
FA1     Full-Adder (1x)               364           1.632
MU8     8:1 Multiplexer (1x)          619           1.318
DFS8    Scan D-Type Flip-Flop (1x)    382           1.933



[0082] Table 4 lists the area and power consumption of wordlength sized components needed in the DFE computation
datapath.



Table 4: Area and power consumption of datapath components using the AMS 0.35 µm standard cell process

Component           Area (µm²)    Power (µW/MHz)    Prop. Delay (ns)
Adder/Subtractor    6120          30.180            6.5
Shifter             7428          15.816            1
Register            4584          23.196            1.2



[0083] Table 5 summarizes the two above mentioned HIPERLAN architectures as depicted in Figures 16 and
17. The silicon area and power/energy consumption numbers are approximate and derived from the above mentioned
data of the AMS 0.35 µm standard cell library. An activity factor of 100% and a load of 30 fF are assumed.

Table 5: Summary of two HIPERLAN architectures as depicted in Figures 16 and 17

                      Minimum Latency    Single PE    Unit
Figure                16                 17
PE                    2                  1
CORDIC blocks         8                  4
Latency               221                270          clocks
Adder/Subtractors     128                64           wordlength sized
Registers             132                66           wordlength sized
Maximum FIFO depth    -                  9
Silicon cell area     1.39               0.69         mm²
Power consumption     6.9                3.5          mW/MHz
Energy consumption    1.5                0.9          µJ/computation



[0084] The results from table 5 indicate that the circuit arrangement according to the invention is well suited for
HIPERLAN. The two proposed architectures are more than twice as fast and require more than ten times less silicon
area and energy than similar known architectures for the same purpose. The invention is therefore well suited to cost
and power constrained terminals.
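
The figures of Tables 4 and 5 can be cross-checked from the cell data of Table 3. The sketch below assumes a wordlength of twelve bits and counts only adder/subtractors and registers, on the assumption that the shifters of the fully pipelined CORDIC blocks reduce to wiring. Under these assumptions it arrives at the tabulated values after rounding. The function and variable names are hypothetical.

```python
WORDLENGTH = 12  # bits, as assumed for the datapath

# (area in um^2, power in uW/MHz) per cell, copied from Table 3
XOR2 = (146, 0.883)
FULL_ADDER = (364, 1.632)
FLIP_FLOP = (382, 1.933)


def per_word(*cells):
    """Area and power of a wordlength sized component built from the cells."""
    area = WORDLENGTH * sum(cell[0] for cell in cells)
    power = WORDLENGTH * sum(cell[1] for cell in cells)
    return area, power


ADD_SUB = per_word(XOR2, FULL_ADDER)   # ripple carry adder/subtractor
REGISTER = per_word(FLIP_FLOP)


def architecture(n_add_sub, n_registers, latency_clocks):
    """Return (area in mm^2, power in mW/MHz, energy in uJ per computation)."""
    area_um2 = n_add_sub * ADD_SUB[0] + n_registers * REGISTER[0]
    power_uw_per_mhz = n_add_sub * ADD_SUB[1] + n_registers * REGISTER[1]
    # 1 uW/MHz corresponds to 1 pJ per clock, so energy = power * clocks.
    energy_uj = power_uw_per_mhz * latency_clocks * 1e-6
    return area_um2 * 1e-6, power_uw_per_mhz * 1e-3, energy_uj


minimum_latency = architecture(128, 132, 221)
single_pe = architecture(64, 66, 270)
print(minimum_latency)
print(single_pe)
```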

[0085] To summarize, it can be stated that the invention provides a method and a circuit for equalizing a receive signal
in a digital receiver with the aid of a DFE structure, said method and circuit making it possible to improve performance
and/or reduce computational complexity in the receiver.



Claims 

1. Method for equalizing a receive signal in a digital receiver with the aid of a DFE (decision feedback equalizer) (2)
including one or several FF (feed-forward) filters (4) and one or several FB (feed-back) filters (6), the method
including a process based on displacement structure theory for solving a set of equations defining optimum filter
coefficients for the DFE (2) in terms of a CIR (channel impulse response) of the receiver, characterized in that for
performing said process first the FF filter coefficients (14) are computed and then the FB filter coefficients (16) are
computed by convolving the FF filter coefficients (14) with the CIR.
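
The convolution step can be illustrated by a small sketch for the symbol spaced, single output case. It assumes that the FB coefficients are the taps of the combined response that follow the decision tap, and the decision delay, sign convention and example values are hypothetical. It is not the claimed process itself.

```python
def convolve(a, b):
    """Full linear convolution of two coefficient sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def fb_from_ff(ff, cir, decision_delay, n_fb):
    """Take the n_fb taps that follow the decision tap as FB coefficients."""
    combined = convolve(ff, cir)
    post = combined[decision_delay + 1:decision_delay + 1 + n_fb]
    return post + [0.0] * (n_fb - len(post))   # pad if the response is short


ff = [1.0, 0.5]          # hypothetical FF filter coefficients
cir = [1.0, 0.4, 0.2]    # hypothetical CIR
fb = fb_from_ff(ff, cir, decision_delay=0, n_fb=2)
print(fb)
```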

2. Method according to claim 1, characterized in that the set of equations is defined as follows:

Ax = y    (II.1)





A = [block matrix whose blocks combine products of the CIR coefficient vectors c_i with the noise terms N_(i-j); entries illegible in this copy]    (II.2)

x = [vector of the FF filter coefficients f_j; illegible in this copy]    (III)

y = [right-hand side vector; illegible in this copy]    (IV)



where f_j denotes the FF filter coefficients (14) for 0 ≤ j ≤ N_f - 1, N_f denotes the number of FF filter coefficients (14),
c_i = (c_i^(0) ... c_i^(N_Q-1))^T denotes the channel impulse response coefficients, i denotes a discrete time index, N_Q
denotes the number of channel outputs, n_i = (n_i^(0) ... n_i^(N_Q-1))^T denotes the added noise vector, E[·] denotes the
expectation operator, N_(i-j) = E[n_i^* n_j^T] denotes the noise, which is assumed to be stationary, and A is a block
matrix of size N_f × N_f, where the blocks are of size N_Q × N_Q.

3. Method according to claim 2, characterized in that the process for solving the set of equations comprises the
steps of defining a block matrix R of size 2 N_f N_Q × (N_f N_Q + 1) and determining its 1,1-Schur complement as follows:

R = \begin{pmatrix} A & y \\ -I & 0 \end{pmatrix}    (XIII)

S = 0 + I A^{-1} y    (XIV)

where -I is a negative identity matrix of size N_f N_Q × N_f N_Q, and 0 is a zero vector of size N_f N_Q × 1.

4. Method according to claim 3, characterized in that the process for solving the set of equations comprises the
step of finding displacement operators F_1, F_2 and generators G and B such that

\nabla_{(F_1, F_2)} R = R - F_1 R F_2^H = G J B^H    (VII)

where J = diag{±1, ±1, ..., ±1}.

5. Method according to claim 4, characterized in that the displacement operators F_1, F_2 and generators G and B
are determined as follows:



[Equation (XVI), giving the generators G and B in terms of the CIR coefficients c_i, equations (XV.1) and (XV.2),
giving the displacement operators F_1 and F_2, and equation (VI), defining the matrix Z from the blocks I and 0, are
illegible in this copy.]



where I denotes the N_Q × N_Q identity matrix and 0 the N_Q × N_Q all zero matrix.

6. Method according to any one of the preceding claims, characterized in that the process for solving the set of
equations comprises the steps of:

1. finding Θ such that the first row of G multiplied by Θ results in a vector with only one nonzero element;

2. postmultiplying G and B with Θ, i.e. forming GΘ and BΘ, and denoting the column with the nonzero first element
of GΘ as g_pvt and the column with the nonzero first element of BΘ as b_pvt;

3. if the Cholesky factor is explicitly needed, then storing g_pvt into the appropriate column of the Cholesky
factor;

4. premultiplying g_pvt with F_1 as defined by equation (XV.1) and premultiplying b_pvt with F_2 as defined by
equation (XV.2);

5. deleting the first row of the resulting generator matrices and using them as G and B of the next iteration;

6. repeating steps 1 - 5 N_f N_Q times;

7. multiplying the remaining N_D rows of B with the remainder of G to obtain the displacement representation
of the FF filter coefficients (14), or, in the single output (N_D = 1) case, the FF filter coefficients (14) themselves.

7. Method according to claim 6, characterized in that in step 2 the first N_f N_Q - i rows of G and B are multiplied by Θ
only once, where i is the iteration number, 0 ≤ i ≤ N_f N_Q - 1.

8. Circuit arrangement for equalizing a receive signal in a digital receiver, with a DFE (decision feedback equalizer)
(2) including one or several FF (feed-forward) filters (4) and one or several FB (feed-back) filters (6), and circuit
means for performing a process based on displacement structure theory for solving a set of equations defining
optimum filter coefficients for the DFE (2) in terms of a CIR (channel impulse response) of the receiver, characterized
in that said circuit means are configured and arranged such that they first compute the FF filter coefficients (14)
and then compute the FB filter coefficients (16) by convolving the FF filter coefficients (14) with the CIR.

9. Circuit arrangement preferably according to claim 8 for equalizing a receive signal in a digital receiver, with a DFE
(decision feedback equalizer) (2) including one or several FF (feed-forward) filters (4) and one or several FB
(feed-back) filters (6), and circuit means for performing a process for solving a set of equations defining optimum filter
coefficients for the DFE (2) in terms of a CIR (channel impulse response) of the receiver, characterized in that said
circuit means comprise a linear chain of CORDIC (Coordinate Rotation Digital Computer) type processing elements
(21, 22, 23, 31, 32, 33, 34, 41, 42, 43, 50, 61, 62, 63, 64, 65, 70, 81, 82, 91).

10. Circuit arrangement according to claim 9, characterized in that said circuit means comprise a master CORDIC
block (51) and a slave CORDIC block (52), said master CORDIC block (51) receiving at a control input a first status
signal and being switched between a vectoring mode of operation and a rotation mode of operation depending on
the first status signal, said master CORDIC block (51) producing a second status signal and transmitting it to a
control input of the slave CORDIC block (52) which is being switched between a rotation mode of operation with
a first direction of rotation and a rotation mode of operation with a second direction of rotation depending on the
second status signal.

11. Circuit arrangement according to any one of claims 9 or 10, characterized in that said circuit means further
comprise a FIFO memory means (45; 95) arranged in a feedback loop with respect to the linear chain of CORDIC type
processing elements (41, 42, 43; 91).